Hardware Acceleration
Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics
Hinckeldey, Leonard, Fosong, Elliot, Miller, Elle, Rubavicius, Rimvydas, McInroe, Trevor, Wollstadt, Patricia, Wiebel-Herboth, Christiane B., Ramamoorthy, Subramanian, Albrecht, Stefano V.
The development of reinforcement learning (RL) algorithms has been largely driven by ambitious challenge tasks and benchmarks. Games have dominated RL benchmarks because they present relevant challenges, are inexpensive to run, and are easy to understand. While games such as Go and Atari have led to many breakthroughs, they often do not directly translate to real-world embodied applications. Recognising the need to diversify RL benchmarks and address the complexities that arise in embodied interaction scenarios, we introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. Assistax uses JAX's hardware acceleration to achieve significant speed-ups for learning in physics-based simulations. In terms of open-loop wall-clock time, Assistax runs up to $370\times$ faster when vectorising training runs compared to CPU-based alternatives. Assistax conceptualises the interaction between an assistive robot and an active human patient using multi-agent RL to train a population of diverse partner agents against which an embodied robotic agent's zero-shot coordination capabilities can be tested. Extensive evaluation and hyperparameter tuning for popular continuous-control RL and MARL algorithms provide reliable baselines and establish Assistax as a practical benchmark for advancing RL research for assistive robotics. The code is available at: https://github.com/assistive-autonomy/assistax.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
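The speed-ups described in the Assistax abstract come from JAX's ability to JIT-compile and vectorise whole training runs onto an accelerator. The snippet below is a minimal, generic sketch of that pattern using `jax.vmap` over random seeds; it is not Assistax's actual API, and the toy `train_run` function is only a stand-in for a real RL training loop.

```python
import jax
import jax.numpy as jnp

def train_run(key):
    # Toy stand-in for one "training run": a few gradient steps on a quadratic.
    theta = jax.random.normal(key, (8,))
    loss = lambda t: jnp.sum(t ** 2)
    for _ in range(20):
        theta = theta - 0.1 * jax.grad(loss)(theta)
    return loss(theta)

# Vectorise 128 independent runs over random seeds and JIT-compile the batch,
# so all runs execute together on the accelerator instead of looping on the CPU.
keys = jax.random.split(jax.random.PRNGKey(0), 128)
batched_train = jax.jit(jax.vmap(train_run))
final_losses = batched_train(keys)
print(final_losses.shape)  # (128,)
```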
Hardware Acceleration for Real-Time Wildfire Detection Onboard Drone Networks
Briley, Austin, Afghah, Fatemeh
Early wildfire detection in remote and forest areas is crucial for minimizing devastation and preserving ecosystems. Autonomous drones offer agile access to remote, challenging terrains, equipped with advanced imaging technology that delivers both high-temporal and detailed spatial resolution, making them valuable assets in the early detection and monitoring of wildfires. However, the limited computation and battery resources of Unmanned Aerial Vehicles (UAVs) pose significant challenges in implementing robust and efficient image classification models. Current works in this domain often operate offline, emphasizing the need for solutions that can perform inference in real time, given the constraints of UAVs. To address these challenges, this paper aims to develop a real-time image classification and fire segmentation model. It presents a comprehensive investigation into hardware acceleration using the Jetson Nano P3450 and the impact of TensorRT, NVIDIA's high-performance deep-learning inference library, on fire classification accuracy and speed. The study includes implementations of Quantization Aware Training (QAT), Automatic Mixed Precision (AMP), and post-training mechanisms, comparing them against the latest baselines for fire segmentation and classification. All experiments utilize the FLAME dataset, an image dataset collected by low-altitude drones during a prescribed forest fire. This work contributes to the ongoing efforts to enable real-time, on-board wildfire detection capabilities for UAVs, addressing speed and the computational and energy constraints of these crucial monitoring systems. The results show a 13% increase in classification speed compared to similar models without hardware optimization. Comparatively, loss and accuracy are within 1.225% of the original values.
- Information Technology (0.55)
- Government (0.46)
- Media (0.37)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)
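Of the techniques listed in the abstract, Automatic Mixed Precision is the simplest to illustrate. The sketch below shows the standard PyTorch AMP training-step pattern with a placeholder model and random data; it is a generic example, not the paper's code, and the FLAME-specific model, dataset, QAT, and TensorRT steps are omitted.

```python
import torch
import torch.nn as nn

# Placeholder binary fire/no-fire classifier and batch; the paper's model and FLAME data differ.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

images = torch.randn(8, 3, 224, 224, device="cuda")
labels = torch.randint(0, 2, (8,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():           # forward pass runs in mixed precision
    loss = criterion(model(images), labels)
scaler.scale(loss).backward()             # scale loss to avoid fp16 gradient underflow
scaler.step(optimizer)
scaler.update()
```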
AutonomROS: A ReconROS-based Autonomous Driving Unit
Lienen, Christian, Brede, Mathis, Karger, Daniel, Koch, Kevin, Logan, Dalisha, Mazur, Janet, Nowosad, Alexander Philipp, Schnelle, Alexander, Waizy, Mohness, Platzner, Marco
Autonomous driving has become an important research area in recent years, and the corresponding systems create an enormous demand for computation. Heterogeneous computing platforms such as systems-on-chip that combine CPUs with reprogrammable hardware offer both computational performance and flexibility and are thus interesting targets for autonomous driving architectures. The de-facto software architecture standard in robotics, including autonomous driving systems, is ROS 2. ReconROS is a framework for creating robotics applications that extends ROS 2 with the possibility of mapping compute-intensive functions to hardware. This paper presents AutonomROS, an autonomous driving unit based on the ReconROS framework. AutonomROS serves as a blueprint for a larger robotics application developed with ReconROS and demonstrates its suitability and extendability. The application integrates the ROS 2 package Navigation 2 with custom-developed software and hardware-accelerated functions for point cloud generation, obstacle detection, and lane detection. In addition, we detail a new communication middleware for shared memory communication between software and hardware functions. We evaluate AutonomROS and show the advantage of hardware acceleration and the new communication middleware for improving turnaround times, achievable frame rates, and, most importantly, reducing CPU load.
- North America > United States > Florida > Orange County > Orlando (0.04)
- Europe > Germany (0.04)
- Africa > Mali (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
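A key metric in the AutonomROS evaluation is the turnaround time per message. The sketch below is a hypothetical rclpy probe node, not part of AutonomROS or ReconROS, that times the processing of each incoming point cloud; it only illustrates how the turnaround improvements reported in the abstract could be measured, with the topic name and node chosen for the example.

```python
import time
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2

class TurnaroundProbe(Node):
    """Logs per-message processing time for one pipeline stage."""
    def __init__(self):
        super().__init__('turnaround_probe')
        self.create_subscription(PointCloud2, 'points', self.on_cloud, 10)

    def on_cloud(self, msg):
        start = time.perf_counter()
        # ... obstacle-detection work would run here (in AutonomROS, offloaded to the FPGA) ...
        elapsed_ms = (time.perf_counter() - start) * 1e3
        self.get_logger().info(f'turnaround: {elapsed_ms:.2f} ms')

def main():
    rclpy.init()
    rclpy.spin(TurnaroundProbe())
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```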
RobotCore: An Open Architecture for Hardware Acceleration in ROS 2
Mayoral-Vilches, Víctor, Neuman, Sabrina M., Plancher, Brian, Reddi, Vijay Janapa
Hardware acceleration can revolutionize robotics, enabling new applications by speeding up robot response times while remaining power-efficient. However, the diversity of acceleration options makes it difficult for roboticists to easily deploy accelerated systems without expertise in each specific hardware platform. In this work, we address this challenge with RobotCore, an architecture to integrate hardware acceleration in the widely-used ROS 2 robotics software framework. This architecture is target-agnostic (supports edge, workstation, data center, or cloud targets) and accelerator-agnostic (supports both FPGAs and GPUs). It builds on top of the common ROS 2 build system and tools and is easily portable across different research and commercial solutions through a new firmware layer. We also leverage the Linux Tracing Toolkit next generation (LTTng) for low-overhead real-time tracing and benchmarking. To demonstrate the acceleration enabled by this architecture, we use it to deploy a ROS 2 perception computational graph on a CPU and FPGA. We employ our integrated tracing and benchmarking to analyze bottlenecks, uncovering insights that guide us to improve FPGA communication efficiency. In particular, we design an intra-FPGA ROS 2 node communication queue to enable faster data flows, and use it in conjunction with FPGA-accelerated nodes to achieve a 24.42% speedup over a CPU.
- Europe > Spain (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- South America > Ecuador (0.04)
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
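For context on what a node in a "ROS 2 perception computational graph" looks like, here is a minimal, generic rclpy node that sits between two image topics; its processing callback is the kind of stage that RobotCore-style FPGA or GPU acceleration would offload. The node and topic names are illustrative and not taken from the paper.

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image

class EdgeFilterNode(Node):
    """One stage of a ROS 2 perception graph; the work in on_image is what an
    accelerated implementation would replace."""
    def __init__(self):
        super().__init__('edge_filter')
        self.pub = self.create_publisher(Image, 'image_filtered', 10)
        self.sub = self.create_subscription(Image, 'image_raw', self.on_image, 10)

    def on_image(self, msg: Image):
        # Placeholder "processing": pass the frame through unchanged.
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(EdgeFilterNode())
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```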
Perceive Runs Transformers at the Edge with Second-Gen Chip - EE Times
Perceive, the AI chip startup spun out of Xperi, has released a second chip with hardware support for transformers, including large language models (LLMs) at the edge. The company demonstrated sentence completion via RoBERTa, a transformer network with 110 million parameters, on its Ergo 2 chip at CES 2023. Ergo 2 comes in the same 7 mm × 7 mm package as the original Ergo, but offers roughly 4× the performance. This performance increase translates to edge inference of transformers with more than 100 million parameters, video processing at higher frame rates, or inference of multiple large neural networks at once. For example, YoloV5-S inference can run at up to 115 inferences per second on Ergo 2; YoloV5-S inference at 30 images per second requires just 75 mW.
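The workload Perceive demonstrated, sentence completion with RoBERTa, corresponds to masked-token prediction. The snippet below reproduces that task on a host machine with the Hugging Face `transformers` pipeline and the public `roberta-base` checkpoint (~125M parameters), which is only a stand-in for the proprietary model compiled onto Ergo 2; the prompt is made up for illustration.

```python
from transformers import pipeline

# roberta-base is a public stand-in for the ~110M-parameter model cited in the article.
fill = pipeline("fill-mask", model="roberta-base")
for candidate in fill("The new chip runs transformer models at the <mask>."):
    print(f"{candidate['token_str']:>12}  {candidate['score']:.3f}")
```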
Intel VP talks AI strategy as company takes on Nvidia
Intel is on an artificial intelligence (AI) mission that it considers very, very possible. The company is the world's largest semiconductor chip manufacturer by revenue, and is best known for its CPU market dominance, with its familiar "Intel inside" campaign -- reminding us all what resided inside our personal computers. However, in an age when AI chips are all the rage, the company finds itself chasing competitors, most notably Nvidia, which has a massive head start in AI processing with its GPUs.
A Study on the Use of Edge TPUs for Eye Fundus Image Segmentation
Medical image segmentation can be implemented using Deep Learning methods with fast and efficient segmentation networks. Single-board computers (SBCs) are difficult to use to train deep networks due to their memory and processing limitations. Specific hardware such as Google's Edge TPU makes them suitable for real-time predictions using complex pre-trained networks. In this work, we study the performance of two SBCs, with and without hardware acceleration, for fundus image segmentation, though the conclusions of this study can be applied to deep-neural-network segmentation of other types of medical images. To test the benefits of hardware acceleration, we use networks and datasets from a previously published work and generalize them by testing on a dataset of thyroid ultrasound images.
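On a Coral Edge TPU, inference with a pre-compiled segmentation network typically follows the pattern below (standard `tflite_runtime` usage with the Edge TPU delegate). The model filename is a placeholder and the dummy input stands in for a real fundus image; this is not the authors' code.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# "fundus_segmenter_edgetpu.tflite" is a placeholder for an Edge TPU-compiled model.
interpreter = Interpreter(
    model_path="fundus_segmenter_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantized models usually expect uint8 input with the shape the interpreter reports.
image = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
mask = interpreter.get_tensor(out["index"])   # per-pixel segmentation output
print(mask.shape)
```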
AI technologies used in Robotics
Robotics today is not the same as the assembly-line robots of the industrial age, because AI is impacting many areas of robotics. At the AI labs, we have been exploring a few of these areas using the Dobot Magician Robotic Arm in London. Our work was originally inspired by this post from Google, which used the Dobot Magician (build your own machine learning powered robot arm using TensorFlow...). In essence, the demo allows you to use voice commands to enable the robotic arm to pick up specific objects (e.g., a red domino). This demo uses multiple AI technologies.
Google Integrates TensorFlow Lite with Android, Adds Automatic Acceleration
Google has announced a new mobile ML stack, dubbed Android ML Platform and built around TensorFlow Lite, which aims to solve a number of problems that developers find when using on-device machine learning. The real announcement behind the Android ML Platform is that its foundation, TensorFlow Lite, will become available on all Android devices supporting Google Play Services. This means it will become part of the backbone that powers the Android platform. Google's and third-party apps will thus no longer need to bundle it in their packages, and developers can take the availability of its API for granted. This will reduce overall device storage usage, since TensorFlow Lite will be shared by all apps.
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Hardware Acceleration of Deep Neural Network Models on FPGA (Part 2 of 2)
While Part 1 of this 2-part blog series covered Deep Neural Networks and the different accelerators for implementing Deep Neural Network models, Part 2 talks about different deep learning frameworks and the hardware frameworks provided by FPGA vendors. A deep learning framework can be considered a tool or library that helps us build DNN models quickly and easily without any in-depth knowledge of the underlying algorithms. It provides a condensed way of defining models using pre-built and optimized components. Some of the important deep learning frameworks are Caffe, TensorFlow, PyTorch, Keras, etc. Caffe is a deep neural network framework designed to improve speed and modularity. It was developed by Berkeley AI Research.
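As an example of the "condensed way of defining models" that such frameworks provide, the PyTorch snippet below assembles a small CNN entirely from pre-built, optimized layers. It is a generic illustration of framework-level model definition, not tied to any particular FPGA toolchain from the series.

```python
import torch
import torch.nn as nn

# A small CNN built from pre-built layers; the framework supplies the optimized
# building blocks, so no low-level kernel code is needed.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),
)

x = torch.randn(1, 3, 32, 32)   # dummy 32x32 RGB input
print(model(x).shape)           # torch.Size([1, 10])
```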